Crawl Me Maybe: Iterative Linked Dataset Preservation

Authors

  • Besnik Fetahu
  • Ujwal Gadiraju
  • Stefan Dietze
Abstract

The abundance of Linked Data being published, updated, and interlinked calls for strategies to preserve datasets in a scalable way. In this paper, we propose a system that iteratively crawls and captures the evolution of linked datasets based on flexible crawl definitions. The captured deltas of datasets are decomposed into two conceptual sets: the evolution of (i) metadata and (ii) the actual data, covering schema- and instance-level statements. The changes are represented as logs that record three main operations: insertions, updates, and deletions. Crawled data is stored in a relational database for efficiency, while the diffs between a dataset and its live version are exposed in RDF format.
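The delta-log idea from the abstract can be sketched as follows: compare two crawled snapshots of a dataset and classify each changed statement as an insertion, update, or deletion. This is a minimal illustration, not the paper's actual implementation; triples are modeled as plain `(subject, predicate, object)` tuples and all names are hypothetical.

```python
def diff_snapshots(old, new):
    """Classify the delta between two crawled snapshots into the three
    log operations named in the abstract: insert, update, delete."""
    old, new = set(old), set(new)
    old_sp = {(s, p) for s, p, _ in old}  # (subject, predicate) pairs seen before
    new_sp = {(s, p) for s, p, _ in new}  # (subject, predicate) pairs seen now

    log = []
    for s, p, o in sorted(new - old):
        # A new object for an already-known (subject, predicate) is an update;
        # an entirely new (subject, predicate) is an insertion.
        op = "update" if (s, p) in old_sp else "insert"
        log.append((op, s, p, o))
    for s, p, o in sorted(old - new):
        # Count a deletion only if the (subject, predicate) vanished entirely,
        # otherwise the triple was already logged above as an update.
        if (s, p) not in new_sp:
            log.append(("delete", s, p, o))
    return log

# Two toy snapshots of a dataset description.
old = [("ds:1", "dct:title", "Old title"), ("ds:1", "dct:creator", "Alice")]
new = [("ds:1", "dct:title", "New title"), ("ds:1", "dct:issued", "2014")]
for entry in diff_snapshots(old, new):
    print(entry)
```

Storing such log entries as rows in a relational table, then serializing them back out as RDF on request, matches the storage/exposure split the abstract describes.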


Similar papers

Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data

This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used to model the response variables. We propose an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...


Explicit and Implicit Schema Information on the Linked Open Data Cloud: Joined Forces or Antagonists?

Schema information about resources in the Linked Open Data (LOD) cloud can be provided in a twofold way: it can be explicitly defined by attaching RDF types to the resources. Or it is provided implicitly via the definition of the resources’ properties. In this paper, we analyze the correlation between the two sources of schema information. To this end, we have extracted schema information regar...


Comparing Topic Coverage in Breadth-First and Depth-First Crawls Using Anchor Texts

Web archives preserve the fast-changing Web by repeatedly crawling its content. The crawling strategy has an influence on the data that is archived. We use link anchor text of two Web crawls created with different crawling strategies in order to compare their coverage of past popular topics. One of our crawls was collected by the National Library of the Netherlands (KB) using a depth-first strat...


An Improved Non-Iterative Privacy Preservation Lotteries

In 2009, a non-iterative privacy preservation scheme for online lotteries was proposed in IET Information Security by J.S. Lee, C.S. Chan, and C.C. Chang [1], who claim their scheme achieves the following properties. Privacy: no one can learn the choices made by lottery players except the players themselves. Security: no one can counterfeit a winner or forge a winning lottery ticket to claim the prize. ...


WebIsALOD: Providing Hypernymy Relations Extracted from the Web as Linked Open Data

Hypernymy relations are an important asset in many applications, and a central ingredient to Semantic Web ontologies. The IsA database is a large collection of such hypernymy relations extracted from the Common Crawl. In this paper, we introduce WebIsALOD, a Linked Open Data release of the IsA database, containing 400M hypernymy relations, each provided with rich provenance information. As the ...



Journal:

Volume   Issue

Pages  -

Publication date: 2014